Corpus Linguistics And Linguistic Theory
Author:
Keywords:
G059922N#56765044, Social Sciences, Linguistics, Language & Linguistics, Dutch, elastic net regression, morphosyntactic variation, construction grammar, probabilistic grammar, SYNTAX, 0801 Artificial Intelligence and Image Processing, 1702 Cognitive Sciences, 2004 Linguistics, Languages & Linguistics, 4704 Linguistics
Abstract:
This article showcases elastic net regression as a means to build fairer models of morphosyntactic variation. Elastic net regression allows lexical items to enter a model on the same level as traditional, high-level predictors, enabling fuller models of variation. We apply elastic net regression to 1,296,574 Dutch verbal cluster tokens from the SoNaR corpus, analysing a morphosyntactic alternation in Dutch subordinate clauses. Our results show that individual verbs have morphosyntactic preferences, indicating that semantic effects are indeed at play. Further analysis shows that semantic patterns exist for either word order, though it remains difficult to draw any semantic generalisations from them. Still, the elastic net technique shows that including lexical items as full predictors in a model is useful, as much of the variation left unexplained by high-level predictors can be explained in lexical terms.
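To make the idea of lexical items as full predictors concrete, the following is a minimal sketch, not the authors' implementation. It fits a logistic regression with an elastic net penalty by plain proximal gradient descent, using only the Python standard library. The three verbs, the single binary high-level predictor (`high_level_flag`), the token counts, and the penalty settings (`lam`, `alpha`) are all invented for illustration and do not come from the SoNaR data.

```python
import math

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal step for the L1 term)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_elastic_net_logistic(rows, labels, lam=0.05, alpha=0.5,
                             step=0.5, iterations=2000):
    """Logistic regression with an elastic net penalty, via proximal gradient.

    Penalty: lam * (alpha * |w|_1 + (1 - alpha) / 2 * |w|_2^2).
    The intercept is left unpenalised.
    """
    n, p = len(rows), len(rows[0])
    weights, intercept = [0.0] * p, 0.0
    for _ in range(iterations):
        grad_w, grad_b = [0.0] * p, 0.0
        for x, y in zip(rows, labels):
            z = intercept + sum(w * v for w, v in zip(weights, x))
            error = 1.0 / (1.0 + math.exp(-z)) - y  # predicted prob. minus outcome
            grad_b += error / n
            for j in range(p):
                grad_w[j] += error * x[j] / n
        intercept -= step * grad_b
        for j in range(p):
            # Gradient step on the smooth part: log-loss plus ridge term.
            moved = weights[j] - step * (grad_w[j] + lam * (1 - alpha) * weights[j])
            # Proximal step for the lasso term, which can set a weight to zero.
            weights[j] = soft_threshold(moved, step * lam * alpha)
    return intercept, weights

# Hypothetical toy data: (verb, high-level flag, tokens in order A, tokens in order B).
VERBS = ["zeggen", "maken", "komen"]
CELLS = [
    ("zeggen", 1, 6, 0), ("zeggen", 0, 4, 2),
    ("maken", 1, 2, 4), ("maken", 0, 0, 6),
    ("komen", 1, 4, 2), ("komen", 0, 2, 4),
]

rows, labels = [], []
for verb, flag, n_a, n_b in CELLS:
    # One column for the high-level predictor, then one indicator column per verb.
    features = [float(flag)] + [1.0 if verb == v else 0.0 for v in VERBS]
    rows += [features] * (n_a + n_b)
    labels += [1.0] * n_a + [0.0] * n_b  # 1.0 = order A, 0.0 = order B

intercept, weights = fit_elastic_net_logistic(rows, labels)
coefficients = dict(zip(["high_level_flag"] + VERBS, weights))
for name, value in coefficients.items():
    print(f"{name:>16}: {value:+.3f}")
```

In this toy setting, verbs that consistently favour one order keep a nonzero coefficient next to the high-level predictor, while a verb with no preference of its own is shrunk toward zero. A real analysis would more plausibly rely on an established elastic net implementation and choose the penalty strength by cross-validation; both are assumptions here rather than details given in the abstract.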